University of Konstanz, Team Apocalypse

VAST 2011 Challenge
Mini-Challenge 1 - Characterization of an Epidemic Spread

Authors and Affiliations:

Juri Buchmüller, University of Konstanz, juri.buchmueller@uni-konstanz.de
Fabian Maaß, University of Konstanz, fabian.maass@uni-konstanz.de
Stephan Sellien, University of Konstanz, stephan.sellien@uni-konstanz.de
Florian Stoffel, University of Konstanz, florian.stoffel@uni-konstanz.de
Matthias Zieker, University of Konstanz, matthias.zieker@uni-konstanz.de
Enrico Bertini, University of Konstanz, enrico.bertini@uni-konstanz.de
Christian Rohrdantz, University of Konstanz, christian.rohrdantz@uni-konstanz.de
Tobias Schreck, University of Konstanz, tobias.schreck@uni-konstanz.de

 

Tool(s):

In the first phase of our analysis we used KNIME [1] and Tableau [2] to get a first impression of the data. After our initial investigations, however, we realized that KNIME and Tableau did not fully serve our purposes: the sheer size of the data made manipulation and interaction cumbersome; the limited integration of analytics and visualization made the whole process too complicated; and the limited integration of geographic, temporal and linguistic features hindered the analytical process considerably. For this reason we developed an interactive and flexible tool able to accommodate all these elements in a single environment.

We called the tool EVA (Epidemic Visual Analyzer). EVA seamlessly combines maps, time-series and tag-clouds and allows for a number of flexible filter and query operations to isolate patterns of interest. In addition, in order to detect complex spatio-temporal trends and textual patterns, it provides robust data clustering, advanced image processing routines, and text analytics operations.

EVA builds on the following components: Lucene [3] for full-text search and advanced query capabilities, the Java 6 platform to display and manipulate the data, the Java imaging library nicejava [4] for image processing, and the freely available IBM Word-Cloud Generator [5] for tag-clouds.

 

1: http://www.knime.org
2: http://www.tableausoftware.com
3: http://lucene.apache.org
4: http://code.google.com/p/nicejava/
5: http://www.alphaworks.ibm.com/tech/wordcloud

Video:

Answer Video

ANSWERS:


MC 1.1 Origin and Epidemic Spread: Identify approximately where the outbreak started on the map (ground zero location). If possible, outline the affected area. Explain how you arrived at your conclusion.

We found two separate epidemic outbreaks: one with pneumonia-related symptoms and one with diarrhea symptoms. The first has its origin at the Vastopolis Dome and the Convention Center and spreads to the east, including parts of Eastside (fig. 1.1.1). The second relates to a truck accident on the Interstate 610 bridge over the Vast River, where substances spilled into the nearby area. The contamination caused diarrhea symptoms spreading from the bridge to the southwest, down the river (fig. 1.1.2).

 

We found the outbreak locations by: (1) isolating users sending messages related to symptoms; (2) searching for anomalous peaks in message density (high volume in a restricted area in a short time); (3) visually inspecting these peaks with tag-clouds; and (4) relating the location of the events to the affected areas.
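Step (3) relies on term frequencies for the messages inside a selected region. The following is a minimal sketch of that idea in Python, not EVA's actual implementation (EVA is written in Java and uses the IBM Word-Cloud Generator). The message layout `(lat, lon, text)`, the function name `tag_cloud_terms`, and the tiny stop-word list are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be longer.
STOP_WORDS = {"the", "a", "an", "and", "i", "to", "of", "in", "is", "my", "at", "have", "this"}


def tag_cloud_terms(messages, bbox, top_n=5):
    """Return the top_n most frequent terms among messages inside bbox.

    messages: iterable of (lat, lon, text) tuples (assumed layout).
    bbox: (min_lat, min_lon, max_lat, max_lon).
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    counts = Counter()
    for lat, lon, text in messages:
        # Keep only messages located inside the selected rectangle.
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            words = re.findall(r"[a-z']+", text.lower())
            counts.update(w for w in words if w not in STOP_WORDS)
    # The counts can serve as font-size weights in a tag cloud.
    return counts.most_common(top_n)
```

Calling `tag_cloud_terms` on the messages of one dense area and one day gives the weighted terms that a tag cloud would display for that selection.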

 


Figure 1.1.1: Downtown/Uptown, pneumonia symptoms, blue outline marks the outbreak area.

 


Figure 1.1.2: Lower Vast River, diarrhea symptoms, blue outline marks the outbreak area.

 


MC 1.2 Epidemic Spread: Present a hypothesis on how the infection is being transmitted. For example, is the method of transmission person-to-person, airborne, waterborne, or something else? Identify the trends that support your hypothesis. Is the outbreak contained? Is it necessary for emergency management personnel to deploy treatment resources outside the affected area? Explain your reasoning.

We mapped every microblog entry as a (transparent) red point on the map at its corresponding location and visually analyzed how the distribution changed from one day to the next (flipping through the days, as shown in the video). In this way, we were able to identify dense areas and changes over time, for instance an abnormally high message density around the hospitals on May 20 (fig. 1.2.1). Assuming the majority of people sending messages from the hospitals are those who have been affected by the disease, we isolated the messages these people sent during the days before May 20 to see whether we could observe any significant trend.

 

Figure 1.2.1: Messages on May 20, hospitals are outlined blue

 

Assuming people in the hospitals write about symptoms related to the epidemic, we searched through their messages to create a word list to use as a reference for identifying sick people. We then isolated the users who, in the given time span, used one of the words in the reference list at least three times, and kept only the messages sent by these users (note that this filtered set also contains messages without disease-related words). At this point, by flipping through the days, we noticed a strong funnel-shaped trend on May 18 and 19 (fig. 1.2.2), suggesting airborne transmission of a disease (fig. 1.1.1, outlined area). This hypothesis is supported by the wind direction during these days. Person-to-person transmission also seems possible, because more and more people all over Vastopolis start to write about the symptoms on May 19.
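The user-level filter described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the code used in EVA: messages are assumed to be `(user_id, text)` pairs, the name `select_symptomatic_users` is hypothetical, and "at least three times" is interpreted here as three or more occurrences of any reference word across all of a user's messages.

```python
import re
from collections import Counter


def select_symptomatic_users(messages, reference_words, min_hits=3):
    """Select users who used reference words at least min_hits times.

    messages: list of (user_id, text) tuples (assumed layout).
    Returns (selected_users, kept_messages); kept_messages holds ALL
    messages of the selected users, not only the symptom-related ones.
    """
    reference = {w.lower() for w in reference_words}
    hits = Counter()
    for user_id, text in messages:
        tokens = re.findall(r"[a-z']+", text.lower())
        # Assumption: every occurrence of a reference word counts as one hit.
        hits[user_id] += sum(1 for t in tokens if t in reference)
    selected = {user for user, n in hits.items() if n >= min_hits}
    kept = [(user, text) for user, text in messages if user in selected]
    return selected, kept
```

Keeping every message of a selected user, rather than only the matching ones, preserves the user's movement trace over the days, which is what makes a spatial trend visible when flipping through them.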

 

Figure 1.2.2: Downtown/Uptown, May 18, 19, 20

 

 

Inspecting the messages with tag clouds, we realized that pneumonia-related symptoms occur mainly in Downtown and Uptown on May 18, 19 and 20 (fig. 1.2.2). To find an explanation for this trend, we searched for abnormally high message densities in the nearby area before those days. In Downtown we found some events (e.g., a bomb threat), but for various reasons (for example, no involvement of hazardous substances) we believe they are not connected to the epidemic spread. When analyzing the two main outbreak locations on May 18 (fig. 1.2.3), we found that a basketball game took place in the Vastopolis Dome and a technology convention in the Convention Center. The large number of people around the hospitals is most likely caused by the outbreak at these two locations.

 

Figure 1.2.3: Finding the events on May 18.

 

People attending those events started to write about shortness of breath on May 18. The symptoms worsen over time, and on May 19 and May 20 severe cases of pneumonia can be found. May 20 is also the day when people start gathering in the city’s hospitals. We therefore believe that an intentional release of a pathogen happened during the above events.

 

Figure 1.2.4: Lower river area showing only diarrhea-related messages on May 18 and 19.



While inspecting the main trend found in the central area with interactive tag clouds, we noticed that the messages sent from the area around the lower part of the river contained a high frequency of words related to diarrhea instead of pneumonia. We then isolated the messages containing words related to diarrhea and, again by flipping through the days, noticed a strong temporal pattern: from April 30 until May 19 almost none of these messages can be found, but on May 19 they suddenly appear (fig. 1.2.4). Since drinking water is pumped out of reservoirs and nearby rivers, a waterborne transmission seems likely and is strongly supported by the shape of the affected area (fig. 1.1.2, outlined area).
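A temporal pattern of this kind can also be checked numerically by counting keyword messages per day and looking for the first day that clearly exceeds the preceding baseline. The sketch below is a hedged illustration in Python; the `(day, text)` message layout, the function names, and the thresholds `factor` and `min_count` are assumptions, and simple substring matching stands in for the Lucene queries used in EVA.

```python
from collections import Counter
from datetime import date, timedelta


def daily_keyword_counts(messages, keywords):
    """Count, per day, the messages that contain at least one keyword.

    messages: iterable of (day, text) with day a datetime.date (assumed layout).
    """
    keys = [k.lower() for k in keywords]
    counts = Counter()
    for day, text in messages:
        lowered = text.lower()
        # Assumption: plain substring matching is good enough for a sketch.
        if any(k in lowered for k in keys):
            counts[day] += 1
    return counts


def first_surge_day(counts, start, end, factor=5.0, min_count=3):
    """Return the first day whose count clearly exceeds the earlier mean.

    A day qualifies if its count is at least min_count and more than
    factor times the mean count of all earlier days in [start, end].
    Returns None if no day qualifies.
    """
    history = []
    day = start
    while day <= end:
        n = counts.get(day, 0)
        if history:
            baseline = sum(history) / len(history)
            if n >= min_count and n > factor * baseline:
                return day
        history.append(n)
        day += timedelta(days=1)
    return None
```

Days without any matching message are counted as zero, so a long quiet period followed by a sudden jump is reported on the day of the jump.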

 

When searching for a possible cause we used a density-based anomaly detection algorithm. The algorithm isolated three potentially related dense areas, which we inspected with our interactive tag clouds.
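The text does not specify which density-based algorithm was used. As a stand-in, the following Python sketch flags grid cells whose message count on one day is far above the counts of the same cell on the other days. The grid binning, the leave-one-out z-score, the floor of 1.0 on the standard deviation, and all names are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def dense_cells(points, days, cell_size, z_threshold=3.0):
    """Flag (day, cell) pairs with an unusually high message count.

    points: iterable of (day, x, y) tuples (assumed layout).
    days: ordered list of all days in the observation period.
    Returns a list of (day, cell, count, z), sorted by descending z.
    """
    counts = defaultdict(lambda: defaultdict(int))  # cell -> day -> count
    for day, x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell][day] += 1

    anomalies = []
    for cell, per_day in counts.items():
        series = [per_day.get(d, 0) for d in days]
        for i, d in enumerate(days):
            # Leave-one-out baseline: the tested day does not inflate its own mean.
            others = series[:i] + series[i + 1:]
            if not others:
                continue
            mean = sum(others) / len(others)
            var = sum((c - mean) ** 2 for c in others) / len(others)
            # Assumed floor on the deviation so constant cells do not divide by zero.
            std = max(math.sqrt(var), 1.0)
            z = (series[i] - mean) / std
            if z >= z_threshold:
                anomalies.append((d, cell, series[i], z))
    return sorted(anomalies, key=lambda a: -a[3])
```

Each flagged cell is only a candidate; as in the analysis above, the messages inside it still need to be inspected (for example with a tag cloud) to decide whether the peak corresponds to a real event.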

 

Figure 1.2.5: Events along the Vast River.

 

The first two events, an explosion in a factory in southern Smogtown (fig. 1.2.5) and a car accident on the Interstate 270 bridge (fig. 1.2.5) on May 17, are very unlikely to be related to the epidemic outbreak because of their position on the lower part of the river: given the direction of the river, they cannot explain the shape and extent of the affected area. On the same day, a few hours later, a truck accident occurred on an Interstate 610 bridge upstream of the outbreak area (fig. 1.2.5). An analysis of the messages near the accident showed that at least one truck spilled substances that may have contaminated the river. The delay between the crash and the symptoms might be caused by the flow rate of the river or by water reservoirs in between.

 

All hospitals, particularly the Vastopolis City Hospital, are overwhelmed with patients on May 20. Using the capacity of hospitals in similarly sized cities as a reference (e.g., Hamburg), we estimated that the hospitals in Vastopolis are just large enough to treat all people who blog about their sickness. Since the number of affected people is probably much larger than the number of those who blog, it is evident that the hospitals will not be able to treat all of them.


To be able to treat all sick people in Vastopolis, the capacities outside the affected areas have to be expanded, given the spread of the epidemic from the center to the surrounding areas (fig. 1.2.1). Given the growth rate of the last days, we also recommend increasing the inpatient capacities in Uptown and Downtown. It might also be necessary to check the water supplies in the lower river area.